miMeta 0.1.0
miMeta is a R package and it implements the new methods for meta-analysis of microbiome association studies that respect the unique features of microbiome data such as compositionality. This tutorial gives examples of applying a meta-analysis method called Melody [1] for microbial signature selection. Melody first generates summary statistics for each study and then uses the summary statistics across studies to select microbial signatures.
Install package from GitHub.
if(!require("miMeta", quietly = TRUE)){
devtools::install_github("ZjpWei/miMeta")
}
Load the packages.
library("miMeta")
library("tidyverse")
There are two approaches miMeta can run meta-analysis.
Approach #1: If the individual-level sample data across all participating studies are available, the function melody can take these data as input to generate, harmonize, and combine summary association statistics across studies for accurate and robust identification of microbial signatures.
Approach #2: If individual-level samples of each study are not in a central place, summary statistics can be generated for each study separately using the functions melody.null.model and melody.get.summary. The summary statistics can be transported to a central place to be harmonized and combined for signature selection using the function melody.meta.summary.
We demonstrate Approach #1 in this session and Approach #2 in the next session.
Here we use the datasets from two metagenomics studies of colorectal cancer (CRC) [2] to demonstrate the use of each function. The CRC_abd is a list of sample-by-feature matrices of relative abundance counts of 267 species under order Clostridiales from the two studies. The CRC_meta is a data frame including the sample-level variables from these two studies. In particular, the following variables are in the CRC_meta data:
Sample identity: “Sample_ID”
Study name: “Study”
Disease status: “Group”
data("CRC_data")
CRC_abd <- CRC_data$CRC_abd
CRC_meta <- CRC_data$CRC_meta
# Prepare input data
rel.abd <- list()
covariate.interest <- list()
for(d in c("FR-CRC", "DE-CRC")){
rel.abd[[d]] <- CRC_abd[CRC_meta$Sample_ID[CRC_meta$Study == d],]
disease <- as.numeric(CRC_meta$Group[CRC_meta$Study == d] == "CRC")
names(disease) <- CRC_meta$Sample_ID[CRC_meta$Study == d]
covariate.interest[[d]] <- data.frame(disease = disease)
}
refs <- c("Coprococcus catus [ref_mOTU_v2_4874]", "Coprococcus catus [ref_mOTU_v2_4874]")
names(refs) <- c("FR-CRC", "DE-CRC")
meta.result <- melody(rel.abd = rel.abd, covariate.interest = covariate.interest,
ref = refs,
verbose = TRUE)
## ++ Construct summary statistics. ++
## ++ 7 cores are using for generating summary statistics. ++
## ++ Search for the best model for covariate of interest disease. ++
The first figure above shows the number of features shared among studies (here, all 266 species are present in both studies), and the second figure shows the meta-coefficient estimates for the selected microbial signatures for CRC.
The following shows 20 coefficient estimates of the microbial features in the best model (the features with non-zero coefficients are the selected signatures):
head(data.frame(coef = meta.result$disease$coef), n = 20)
## coef
## Anaerostipes caccae [ref_mOTU_v2_1381] 0.000000
## Anaerostipes hadrus [ref_mOTU_v2_1309] 0.000000
## Anaerotruncus colihominis [ref_mOTU_v2_0884] 0.000000
## Blautia hydrogenotrophica [ref_mOTU_v2_4324] 0.000000
## Blautia obeum [ref_mOTU_v2_4202] 0.000000
## Blautia obeum [ref_mOTU_v2_4719] 0.000000
## Blautia producta [ref_mOTU_v2_4020] 0.000000
## Blautia sp. KLE 1732 [ref_mOTU_v2_0859] 0.000000
## Blautia wexlerae [ref_mOTU_v2_0466] 0.000000
## butyrate-producing bacterium SS3/4 [ref_mOTU_v2_3825] 0.000000
## Butyrivibrio crossotus [ref_mOTU_v2_4383] 0.000000
## Clostridiales bacterium VE202-14 [ref_mOTU_v2_2689] 0.000000
## Clostridium asparagiforme [ref_mOTU_v2_4394] 0.000000
## Clostridium boltae/clostridioforme [ref_mOTU_v2_0886] 1.288155
## Clostridium citroniae [ref_mOTU_v2_4882] 0.000000
## Clostridium clostridioforme [ref_mOTU_v2_0979] 0.000000
## Clostridium leptum [ref_mOTU_v2_4234] 0.000000
## Clostridium saccharolyticum [ref_mOTU_v2_1380] 0.000000
## Clostridium scindens [ref_mOTU_v2_0883] 0.000000
## Clostridium sp. AT4 [meta_mOTU_v2_7263] 0.000000
By default, Melody only outputs the information about the best model. If users want to see the information about all models on the search path of subset size, set output.best.one = FALSE. See help(melody) for more details.
If the datasets of different studies live in different locations and cannot be conveniently shared among studies, we can first generate summary statistics for each study:
# Generate summary statistics for each study
null.obj.FR <- melody.null.model(rel.abd = rel.abd["FR-CRC"], ref = "Coprococcus catus [ref_mOTU_v2_4874]")
summary.stats.FR <- melody.get.summary(null.obj = null.obj.FR,
covariate.interest = covariate.interest["FR-CRC"])
null.obj.DE <- melody.null.model(rel.abd = rel.abd["DE-CRC"], ref = "Coprococcus catus [ref_mOTU_v2_4874]")
summary.stats.DE <- melody.get.summary(null.obj = null.obj.DE,
covariate.interest = covariate.interest["DE-CRC"])
These summary statistics can be transported to a central location for meta-analysis:
# Concatenate summary statistics
summary.stats.all <- c(summary.stats.FR, summary.stats.DE)
# Meta-analysis to harmonize and combine summary statistics across studies
meta.result.2 <- melody.meta.summary(summary.stats = summary.stats.all, verbose = TRUE)
## ++ Search for the best model for covariate of interest disease. ++
The meta-analysis results generated this way are identical to those generated in the previous session.
Microbiome association studies can involve a large number of covariates of interest (e.g., omics variables). We show here how to use Melody to meta-analyze eight microbiome-metabolome association studies [3]. In these eight studies, we have 101 genera and 450 metabolites and we are interested in identifying genera that are associated with individual metabolites. This analysis takes approximately 20 minutes.
# Load metabolite data:
# You may see the following website on how to directly load data from
# github into R https://github.com/ZjpWei/Melody/raw/main/Metabolite.rda
load(file = url("https://github.com/ZjpWei/Melody/raw/main/Metabolite.rda"))
# get null model
null.obj <- melody.null.model(rel.abd = otu_data_lst, covariate.adjust = covariates_adjust_lst)
# Get summary statistics
summary.stats <- melody.get.summary(null.obj = null.obj, covariate.interest = cmpd_data_lst, cluster = cluster_data_lst)
# Meta-analysis
meta.scan.result <- melody.meta.summary(summary.stats = summary.stats, verbose = TRUE)
The following shows 20 coefficient estimates of the microbial features for 5 metabolites.
selected.num <- sort(unlist(lapply(meta.scan.result, function(d){sum(d$coef!=0)})), decreasing = TRUE)
top.cov.name <- names(selected.num)[1:min(5, length(selected.num))]
coef_mat <- sapply(meta.scan.result[top.cov.name], function(d){d$coef}, simplify = TRUE)
rownames(coef_mat) <- gsub(".*;g__","g__",rownames(coef_mat))
head(coef_mat, n = 20)
## HMDB0000619 HMDB0000784 HMDB0000792 HMDB0001414
## g__Bifidobacterium 0.00000000 0.0000000 0.00000000 0.2046077
## g__Collinsella 0.00000000 0.0000000 0.00000000 0.1342986
## g__Eggerthella 0.00000000 -0.2419357 0.00000000 0.0000000
## g__Alloprevotella 0.00000000 0.0000000 0.00000000 0.0000000
## g__Bacteroides 0.00000000 0.0000000 0.00000000 0.0000000
## g__Paraprevotella 0.00000000 0.0000000 0.00000000 0.0000000
## g__Phocaeicola 0.08871979 -0.1552934 -0.07284483 0.0000000
## g__Prevotella 0.00000000 0.0000000 0.00000000 0.2051289
## g__Barnesiella -0.24735219 0.2729527 0.22388989 -0.4460692
## g__Butyricimonas -0.17262615 0.1942097 0.15589275 -0.2477723
## g__Odoribacter -0.22885465 0.2634940 0.20652624 -0.3814099
## g__Alistipes -0.27715610 0.2733781 0.27562120 -0.4752315
## g__Parabacteroides 0.00000000 0.0000000 0.00000000 0.0000000
## g__Cryptobacteroides -0.20011184 0.4709572 0.47265564 0.0000000
## g__Bilophila 0.00000000 0.0000000 0.00000000 0.0000000
## g__PeH17 -0.59568855 0.5947508 0.48459578 -0.4989612
## g__UBA11524 -0.35651505 0.6711283 0.45386173 -0.4060579
## g__Clostridium 0.00000000 -0.3549921 0.00000000 0.4078590
## g__Anaerotignum -0.11342554 -0.1249551 0.00000000 0.0000000
## g__Acetatifactor -0.21742750 0.2807143 0.25324867 -0.2835345
## HMDB0000623
## g__Bifidobacterium 0.00000000
## g__Collinsella 0.00000000
## g__Eggerthella -0.17954156
## g__Alloprevotella 0.00000000
## g__Bacteroides 0.00000000
## g__Paraprevotella 0.00000000
## g__Phocaeicola -0.08679264
## g__Prevotella 0.00000000
## g__Barnesiella 0.20827256
## g__Butyricimonas 0.10238865
## g__Odoribacter 0.17299922
## g__Alistipes 0.19532668
## g__Parabacteroides 0.00000000
## g__Cryptobacteroides 0.25024966
## g__Bilophila 0.00000000
## g__PeH17 0.43766993
## g__UBA11524 0.44191083
## g__Clostridium 0.00000000
## g__Anaerotignum 0.00000000
## g__Acetatifactor 0.29886506
sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Sonoma 14.4.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/Chicago
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] lubridate_1.9.2 forcats_1.0.0 stringr_1.5.0 dplyr_1.1.4
## [5] purrr_1.0.1 readr_2.1.4 tidyr_1.3.0 tibble_3.2.1
## [9] ggplot2_3.5.0 tidyverse_2.0.0 miMeta_0.1.0 BiocStyle_2.28.0
##
## loaded via a namespace (and not attached):
## [1] sass_0.4.9 utf8_1.2.4 generics_0.1.3
## [4] stringi_1.7.12 enrichwith_0.3.1 lattice_0.21-8
## [7] hms_1.1.3 digest_0.6.35 magrittr_2.0.3
## [10] timechange_0.2.0 evaluate_0.21 grid_4.3.1
## [13] brglm2_0.9.2 bookdown_0.35 iterators_1.0.14
## [16] fastmap_1.1.1 foreach_1.5.2 doParallel_1.0.17
## [19] plyr_1.8.9 jsonlite_1.8.8 Matrix_1.6-0
## [22] nnet_7.3-19 gridExtra_2.3 BiocManager_1.30.21.1
## [25] fansi_1.0.6 scales_1.3.0 UpSetR_1.4.0
## [28] codetools_0.2-19 numDeriv_2016.8-1.1 jquerylib_0.1.4
## [31] cli_3.6.2 rlang_1.1.3 munsell_0.5.1
## [34] withr_3.0.0 cachem_1.0.8 yaml_2.3.7
## [37] tools_4.3.1 parallel_4.3.1 tzdb_0.4.0
## [40] colorspace_2.1-0 abess_0.4.8 vctrs_0.6.5
## [43] R6_2.5.1 lifecycle_1.0.4 MASS_7.3-60
## [46] pkgconfig_2.0.3 pillar_1.9.0 bslib_0.7.0
## [49] gtable_0.3.4 glue_1.7.0 Rcpp_1.0.12
## [52] highr_0.10 xfun_0.39 tidyselect_1.2.1
## [55] rstudioapi_0.15.0 knitr_1.43 farver_2.1.1
## [58] htmltools_0.5.8.1 labeling_0.4.3 rmarkdown_2.23
## [61] compiler_4.3.1
Wei Z, Chen G, Tang ZZ. Melody identifies generalizable microbial signatures in microbiome association meta-analysis. Submitted.
Wirbel, Jakob et al. Meta-analysis of fecal metagenomes reveals global microbial signatures that are specific for colorectal cancer.
Muller, E., Algavi, Y.M. & Borenstein, E. The gut microbiome-metabolome dataset collection: a curated resource for integrative meta-analysis. npj Biofilms Microbiomes 8, 79 (2022).